20 research outputs found

User Interaction Models for Disambiguation in Programming by Example

Author: Grechkin Maxim
Gulwani Sumit
Le Vu
Marron Mark
Mayer Mikaël
Polozov Alex
Singh Rishabh
Soares Gustavo
Zorn Ben
Publication venue
Publication date: 15/09/2015
Field of study

Programming by Examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create small scripts for automating repetitive tasks. However, examples, though often easy to provide, are an ambiguous specification of the user's intent. Because of that, a key impediment to the adoption of PBE systems is the lack of user confidence in the correctness of the program that was synthesized by the system. We present two novel user interaction models that communicate actionable information to the user to help resolve ambiguity in the examples. One of these models allows the user to effectively navigate the huge set of programs that are consistent with the examples provided by the user. The other model uses active learning to ask directed example-based questions to the user on the test input data over which the user intends to run the synthesized program. Our user studies show that each of these models significantly reduces the number of errors in the performed task without any difference in completion time. Moreover, both models are perceived as useful, and the proactive, active-learning-based model is slightly preferred with respect to the users' confidence in the result.
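The ambiguity described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the candidate "programs", the function names `consistent` and `distinguishing_input`, and the sample data are all hypothetical, chosen only to show how a single example can leave several programs consistent and how a test input on which they disagree can be turned into a directed question.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical candidate programs: simple string transformations.
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "first_word": lambda s: s.split()[0],
    "first_three_chars": lambda s: s[:3],
    "upper_first_word": lambda s: s.split()[0].upper(),
    "last_word": lambda s: s.split()[-1],
}


def consistent(examples: List[Tuple[str, str]]) -> List[str]:
    """Return the names of all candidates that reproduce every example."""
    return [
        name
        for name, program in CANDIDATES.items()
        if all(program(inp) == out for inp, out in examples)
    ]


def distinguishing_input(names: List[str], test_inputs: List[str]) -> Optional[str]:
    """Return the first test input on which the given candidates disagree."""
    for x in test_inputs:
        if len({CANDIDATES[name](x) for name in names}) > 1:
            return x
    return None


examples = [("Ann Lee", "Ann")]
remaining = consistent(examples)            # two programs still fit
question = distinguishing_input(remaining, ["Bob Ray", "Maria Diaz"])
# Asking the user for the intended output on `question` resolves the ambiguity.
examples.append((question, "Maria"))
resolved = consistent(examples)
```

Here the one example "Ann Lee" → "Ann" cannot separate "take the first word" from "take the first three characters"; the input "Maria Diaz" can, so it is the useful question to ask.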

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Computational Curation of Open Science Data

Author: Grechkin Maxim
Publication venue
Publication date: 01/01/2018
Field of study

Thesis (Ph.D.)--University of Washington, 2018. Rapid advances in data collection, storage and processing technologies are driving a new, data-driven paradigm in science. In the life sciences, progress is driven by plummeting genome sequencing costs, opening up new fields of bioinformatics, genomics, and systems biology. The return on the enormous investments into the collection and storage of the data is hindered by a lack of curation, leaving a significant portion of the data stagnant and underused. In this dissertation, we introduce several approaches aimed at making open scientific data accessible, valuable, and reusable. First, in the Wide-Open project, we introduce a text mining system for detecting datasets that are referenced in published papers but are still kept private. After parsing over 1.5 million open access publications, Wide-Open has identified hundreds of datasets overdue for publication, 400 of which were then released within one week. Second, we propose a machine learning system, EZLearn, for annotating scientific data into potentially thousands of classes without the manual work required to provide training labels. EZLearn is based on the observation that in scientific domains, data samples often come with natural language descriptions meant for human consumption. We take advantage of those descriptions by introducing an auxiliary natural language processing system, training it together with the main classifier in a co-training fashion. Third, we introduce Cedalion, a system that can capture scientific claims from papers, validate them against the data associated with the paper, then generalize and adapt the claims to other relevant datasets in the repository to gather additional statistical evidence. We evaluated Cedalion by applying it to gene expression datasets and producing reports summarizing the evidence for or against the claim based on the entirety of the collected knowledge in the repository. We find that the claim-based algorithms we propose outperform conventional data integration methods and achieve high accuracy against manually validated claims.

DSpace at The University of Washington

Wide Open Science

Author: Bill Howe
Maxim Grechkin
Publication venue
Publication date
Field of study

A paper for the “Imagining Tomorrow’s University: Rethinking scholarship, education, and institutions for an open, networked era” workshop.

ZENODO

Pathway Graphical Lasso

Author: Fazel Maryam
Grechkin Maxim
Lee Su-In
Witten Daniela
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 21/02/2015
Field of study

Graphical models provide a rich framework for summarizing the dependencies among variables. The graphical lasso approach attempts to learn the structure of a Gaussian graphical model (GGM) by maximizing the log likelihood of the data, subject to an ℓ1 penalty on the elements of the inverse covariance matrix. Most algorithms for solving the graphical lasso problem do not scale to a very large number of variables. Furthermore, the learned network structure is hard to interpret. To overcome these challenges, we propose a novel GGM structure learning method that exploits the fact that for many real-world problems we have prior knowledge that certain edges are unlikely to be present. For example, in gene regulatory networks, a pair of genes that does not participate together in any of the cellular processes, typically referred to as pathways, is less likely to be connected. In computer vision applications in which each variable corresponds to a pixel, each variable is likely to be connected to the nearby variables. In this paper, we propose the pathway graphical lasso, which learns the structure of a GGM subject to pathway-based constraints. To solve this problem, we decompose the network into smaller parts and use a message-passing algorithm to communicate among the subnetworks. Our algorithm achieves an orders-of-magnitude improvement in run time compared to the state-of-the-art optimization methods for the graphical lasso problem that were modified to handle pathway-based constraints.
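For orientation, the standard graphical lasso objective mentioned in this abstract can be written as below, where S is the empirical covariance matrix, Θ the inverse covariance (precision) matrix, and λ the penalty weight. The second line is only one plausible way to express a pathway-based constraint, with P_1, ..., P_K as assumed notation for the pathway gene sets; the exact formulation used in the paper may differ.

```latex
% Standard graphical lasso: penalized Gaussian log likelihood
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda \,\lVert \Theta \rVert_{1}

% Assumed form of a pathway-based constraint (notation P_k is hypothetical):
\Theta_{ij} = 0 \quad \text{whenever no pathway } P_k \text{ contains both } i \text{ and } j
```

Under such a constraint, only edges within a shared pathway remain free parameters, which is what would make a decomposition into smaller subnetworks natural.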

Association for the Advancement of Artificial Intelligence: AAAI Publications

Wide-Open: Accelerating public data release by automating detection of overdue datasets

Author: Bill Howe (4096402)
Hoifung Poon (4096399)
Maxim Grechkin (2657839)
Publication venue
Publication date: 01/06/2017
Field of study

Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
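The two steps named in this abstract, finding dataset references in article text and checking whether each dataset is still private, can be sketched in Python as follows. This is a simplified illustration and not the Wide-Open implementation: the regular expression covers only GEO series (`GSE…`) and SRA study (`SRP…`) accessions, and `is_public` is a hypothetical stand-in for a real repository query.

```python
import re
from typing import Callable, List, Set

# GEO series accessions look like "GSE12345"; SRA study accessions like "SRP012345".
ACCESSION_RE = re.compile(r"\b(GSE\d+|SRP\d+)\b")


def find_accessions(text: str) -> Set[str]:
    """Extract the distinct GEO/SRA accession numbers mentioned in the text."""
    return set(ACCESSION_RE.findall(text))


def overdue_datasets(article_text: str, is_public: Callable[[str], bool]) -> List[str]:
    """List accessions cited in a published article that are not yet public.

    `is_public` is a hypothetical callback; a real system would query the
    repository and parse the result instead.
    """
    return sorted(acc for acc in find_accessions(article_text) if not is_public(acc))


# Toy usage with a stubbed repository lookup.
article = "Data were deposited in GEO (GSE11111) and SRA (SRP22222); see also GSE33333."
released = {"GSE11111"}
overdue = overdue_datasets(article, lambda acc: acc in released)
```

Because the article is already published, any accession it cites that the lookup reports as private is a candidate for overdue release.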

Directory of Open Access Journals

FigShare

Number of samples in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO).

Author: Bill Howe (4096402)
Hoifung Poon (4096399)
Maxim Grechkin (2657839)
Publication venue
Publication date
Field of study

Data underlying the figure are available as S1 Data (http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.2002477#pbio.2002477.s001).

FigShare

Number of Gene Expression Omnibus (GEO) datasets overdue for release over time, as detected by Wide-Open.

Author: Bill Howe (4096402)
Hoifung Poon (4096399)
Maxim Grechkin (2657839)
Publication venue
Publication date
Field of study

Prior to this submission, we notified GEO of the standing list, which led to a dramatic drop in overdue datasets (magenta portion), with 400 datasets released within the first week. Data underlying the figure are available as S2 Data (http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.2002477#pbio.2002477.s002).

FigShare

Average delay from submission to release in the Gene Expression Omnibus (GEO).

Author: Bill Howe (4096402)
Hoifung Poon (4096399)
Maxim Grechkin (2657839)
Publication venue
Publication date
Field of study

Data underlying the figure are available as S3 Data (http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.2002477#pbio.2002477.s003).

FigShare

Stress-strain state analysis of the leading car body of DPKr-2 diesel train under action of design and operational loads

Author: Alex Grechkin
Alexander Pshinko
Andriy Kuzyshyn
Lyudmila Ursulyak
Maxim Kramarenko
Sergei Kostritsa
Publication venue: EDP Sciences
Publication date: 16/10/2019
Field of study

Purpose. Provision of strength and durability of the main structural element of the DPKr-2 diesel train, the leading car body. Methodology. A spatial solid-state 3-D model of the body is built, and durability calculations are carried out for the loads stipulated by the regulatory documents in force in Ukraine. In particular, the following main estimated modes are considered: mode 1, a notional safety mode that takes into account the possibility of considerable longitudinal forces arising during shunting movements, transportation and accidental collision; and mode 2, an operational mode that takes into account the forces acting on a train during acceleration to constructional speed, coasting or braking from this speed while passing a curve. Results. Based on the results of theoretical and experimental studies, it is concluded that the leading car body construction of the DPKr-2 diesel train meets the requirements of the regulatory documents regarding strength and durability. Practical relevance. A complex of calculation and experimental work assessing the stress-strain state of the leading car body of the DPKr-2 diesel train under the action of design and operational loads allowed the creation of a construction that meets not only operational requirements but also strength and durability requirements.

EDP Sciences OAI-PMH repository (1.2.0)

Identifying Network Perturbation in Cancer

Author: Andrew J. Gentles (224554)
Benjamin A. Logsdon (178699)
Maxim Grechkin (2657839)
Su-In Lee (2584)
Publication venue
Publication date: 21/02/2016
Field of study

We present a computational framework, called DISCERN (DIfferential SparsE Regulatory Network), to identify informative topological changes in gene-regulator dependence networks inferred on the basis of mRNA expression datasets within distinct biological states. DISCERN takes two expression datasets as input: an expression dataset of diseased tissues from patients with a disease of interest and another expression dataset from matching normal tissues. DISCERN estimates the extent to which each gene is perturbed, i.e., has distinct regulator connectivity in the inferred gene-regulator dependencies between the disease and normal conditions. This approach has distinct advantages over existing methods. First, DISCERN infers conditional dependencies between candidate regulators and genes, where conditional dependence relationships discriminate the evidence for direct interactions from indirect interactions more precisely than pairwise correlation. Second, DISCERN uses a new likelihood-based scoring function to alleviate concerns about accuracy of the specific edges inferred in a particular network. DISCERN identifies perturbed genes more accurately in synthetic data than existing methods for identifying perturbed genes between distinct states. In expression datasets from patients with acute myeloid leukemia (AML), breast cancer and lung cancer, genes with high DISCERN scores in each cancer are enriched for known tumor drivers, genes associated with the biological processes known to be important in the disease, and genes associated with patient prognosis, in the respective cancer. Finally, we show that DISCERN can uncover potential mechanisms underlying network perturbation by explaining observed epigenomic activity patterns in cancer and normal tissue types more accurately than alternative methods, based on the available epigenomic data from the ENCODE project.

Directory of Open Access Journals

FigShare